fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests #2843

Open
leolara wants to merge 6 commits into ethereum:devnets/bal/7 from leolara:wt-bal-7-amsterdam-oog-by-design

Conversation

@leolara
Member

@leolara leolara commented May 12, 2026

🗒️ Description

Follow-up to #2839 — applies the Approach-1 / Approach-2 strategies
described in .mb/oog-by-design-amsterdam-approaches.md to the
OoG-by-design subset of Amsterdam-skipped ported_static tests.

Clears 74 skip entries (tests/ported_static/amsterdam_skip_list.txt
897 → 823).

Commits

  1. feat(forks): add Fork.oog_budget_lift helper — new classmethod
    on BaseFork that composes sstore_state_gas, create_state_gas,
    and code_deposit_state_gas into a single budget lift for tests
    calibrated to OoG just below an EIP-8037-affected cost boundary.
    Returns 0 on pre-EIP-8037 forks, so callers apply it
    unconditionally without a fork guard.

  2. fix(ported_static): Approach-1 fork-conditional OoG lift
    8 single-spillable-op tests where tx_gas[1] is tuned to barely
    complete a CREATE / SSTORE / CREATE+SSTOREs on Cancun but OoGs on
    Amsterdam due to state-gas spill. Lift the budget with
    Fork.oog_budget_lift(...) using the right
    (creates, sstores, deploy_code_size) counts:

    • stCreate2/test_create2_oo_gafter_init_code.py
    • stCreate2/test_create2_oo_gafter_init_code_returndata2.py
    • stCreateTest/test_create_oo_gafter_init_code.py
    • stCreateTest/test_create_oo_gafter_init_code_returndata2.py
    • stRevertTest/test_revert_sub_call_storage_oog.py
    • stRevertTest/test_revert_sub_call_storage_oog2.py
    • stWalletTest/test_wallet_construction_oog.py
    • stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py
  3. chore(ported_static): remove stale Amsterdam skip entries for stSStoreTest 16-pair family
    22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding
    `gas`, `gas_left`, `change_from_external_call_in_init_code`) had
    3 skip entries each. Re-running these on Amsterdam with the skip
    list disabled shows all 60 fixture variants per file already
    pass without any code changes
    — the post-state assertion
    (`contract_2: storage={1: 0}`) holds whether OoG fires at pair 14
    (Cancun) or pair 5 (Amsterdam after state-gas spill). Remove all
    66 stale entries, no test-file changes.

  4. chore: ruff format oog_budget_lift unit test assertions
    cosmetic format pass on the helper's unit test.

Approach 4 (1024-depth CALL family) — deferred

The 3 `call1024_oog` / `callcode1024_oog` files have a per-frame
state-gas interaction that the simple `oog_budget_lift` model
underestimates (depth dropped from 134→44 frames with a 3×SSTORE
lift, vs expected near-zero drop). They need per-frame analysis
beyond the helper's scope. Leaving them skipped; follow-up PR.

Verification

  • `uv run fill --fork Amsterdam -m "not slow" tests/ported_static/` → 0 failed (16704 passed, 2226 skipped)
  • `uv run fill -m "not slow" tests/ported_static/` (all forks) → 0 failed (60476 passed)
  • `uv run pytest packages/testing/.../test_forks.py::test_oog_budget_lift` → 1 passed
  • `just static` → clean (ruff, mypy, vulture, ethereum-spec-lint, actionlint, codespell)

🔗 Related Issues or PRs

Follows #2839 on the same fork. Independent — can merge in either
order.

✅ Checklist

  • All: Ran `just static` — clean.
  • All: PR title follows the repo standard.
  • All: Considered updating the online docs in ./docs/.
  • All: Set appropriate labels (only maintainers can apply).
  • Tests: Ran `mkdocs serve` to verify auto-generated docs.
  • Tests: post-mortem update (N/A — not implementing a missed test
    case).
  • Ported Tests: `@manually-enhanced` marker added to all eight
    edited tests; `@ported_from` preserved from upstream.

@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (devnets/bal/7@a3e5201). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             devnets/bal/7    #2843   +/-   ##
================================================
  Coverage                 ?   87.35%           
================================================
  Files                    ?      586           
  Lines                    ?    35957           
  Branches                 ?     3382           
================================================
  Hits                     ?    31410           
  Misses                   ?     3926           
  Partials                 ?      621           
Flag Coverage Δ
unittests 87.35% <ø> (?)


leolara added 3 commits May 14, 2026 16:31
OoG-by-design ported_static tests are calibrated to land just below
a cost boundary; on Amsterdam each fresh SSTORE-set and CREATE
spills its state-gas portion back into regular gas when the per-tx
reservoir is empty. Tests that expect "N SSTOREs and M CREATEs
complete before OoG" need their gas budget lifted by
N * sstore_state_gas + M * create_state_gas to land at the same
intermediate state.

`Fork.oog_budget_lift(sstores_before_oog=N, creates_before_oog=M)`
composes the two existing state-gas helpers and returns 0 on
pre-EIP-8037 forks (where both helpers are 0), so callers can apply
it unconditionally without a fork guard.

Unit-tested on Cancun (zero) and Amsterdam (cumulative spill).
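
The shape of the helper described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ForkSketch` and its fields are hypothetical stand-ins for the real fork classes, the SSTORE value is a placeholder, and the only number taken from this conversation is the 183600 that a later commit message quotes for `creates_before_oog=1` on Amsterdam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForkSketch:
    """Hypothetical stand-in for a fork class (not the real BaseFork)."""

    sstore_state_gas: int  # state-gas portion of one fresh SSTORE-set
    create_state_gas: int  # state-gas portion of one CREATE (new account)

    def oog_budget_lift(
        self, *, sstores_before_oog: int = 0, creates_before_oog: int = 0
    ) -> int:
        """Extra gas needed to reach the same intermediate state before OoG."""
        return (
            sstores_before_oog * self.sstore_state_gas
            + creates_before_oog * self.create_state_gas
        )


# Pre-EIP-8037: both per-op values are 0, so the lift is always 0.
CANCUN = ForkSketch(sstore_state_gas=0, create_state_gas=0)
# 183600 is quoted later in this PR; 1000 is a placeholder, not a real value.
AMSTERDAM = ForkSketch(sstore_state_gas=1000, create_state_gas=183600)

base_budget = 80_000  # illustrative OoG-tuned budget
for fork in (CANCUN, AMSTERDAM):
    lifted = base_budget + fork.oog_budget_lift(creates_before_oog=1)
    print(fork, lifted)
```

Because the pre-EIP-8037 instance returns 0, the caller adds the lift unconditionally and the original budget is left untouched there.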
…*/stRevert*/stWallet* tests

Eight tests share the OoG-by-design shape `tx_gas = [oog_path, success_path]`
where the success_path budget is tuned to barely complete a single
CREATE, a CREATE plus a few SSTOREs, or a deploy chain. On Amsterdam
EIP-8037 splits NEW_ACCOUNT, fresh SSTORE-set, and code-deposit cost
into a regular portion plus a state-gas portion; with an empty
reservoir, the state-gas spills back into regular gas and breaks the
success_path budget.

Apply `Fork.oog_budget_lift` with the right (creates, sstores,
deploy_code_size) counts to lift the budget on Amsterdam only. Pre-
EIP-8037 forks return 0 from the helper, so the original budget is
preserved.

Files (skip-list entries cleared):
- stCreate2/test_create2_oo_gafter_init_code.py (-g1)
- stCreate2/test_create2_oo_gafter_init_code_returndata2.py (-g1)
- stCreateTest/test_create_oo_gafter_init_code.py (-g1)
- stCreateTest/test_create_oo_gafter_init_code_returndata2.py (-g1)
- stRevertTest/test_revert_sub_call_storage_oog.py (-g1-v0)
- stRevertTest/test_revert_sub_call_storage_oog2.py (-g1-v0)
- stWalletTest/test_wallet_construction_oog.py (-g1)
- stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py (-g1)

Removes 8 entries from amsterdam_skip_list.txt.
…eTest 16-pair family

22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding `gas`,
`gas_left`, `change_from_external_call_in_init_code`) had 3 skip
entries each (`d{0,1,2}-g1`). Re-running these on Amsterdam with the
skip list disabled shows all 60 fixture variants per file already
pass without any code changes.

The post-state expectation at `g=1` is `contract_2: storage={1: 0}`
(only slot 1, asserted to be zero). On Cancun, ~14 of the 16
SSTORE pairs complete before OoG; on Amsterdam EIP-8037 the state-
gas spill cuts that to ~5 pairs. In both cases slot 1 ends at 0 and
the final `SSTORE(1, 1)` does not run, so the assertion holds on
both forks unchanged.
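
Why the assertion is insensitive to where OoG fires can be illustrated with a toy model. Everything here is an assumption for illustration: the pair shape (write 1, then reset to 0), the abstract gas units, and the function name `run_frame` are made up and do not reproduce the real test bytecode. The one property it models is that an out-of-gas frame discards its own writes.

```python
def run_frame(
    gas: int, pair_cost: int, pairs: int = 16
) -> tuple[dict[int, int], int]:
    """Toy frame: returns (post-state storage, pairs completed)."""
    committed = {1: 0}         # slot 1 starts (and is asserted) at zero
    working = dict(committed)  # writes made inside the frame
    completed = 0
    for _ in range(pairs):
        if gas < pair_cost:
            # OoG: the frame's writes are discarded.
            return committed, completed
        gas -= pair_cost
        working[1] = 1  # first SSTORE of the pair (assumed shape)
        working[1] = 0  # second SSTORE of the pair resets the slot
        completed += 1
    working[1] = 1  # final SSTORE(1, 1), reached only if all pairs fit
    return working, completed


# Same budget, cheap pairs vs. pairs made more expensive by a spill.
cheap_state, cheap_pairs = run_frame(gas=70, pair_cost=5)
costly_state, costly_pairs = run_frame(gas=70, pair_cost=14)
print(cheap_pairs, cheap_state)
print(costly_pairs, costly_state)
```

In this toy the two runs stop after different numbers of pairs but leave the same storage, which is the situation the commit message describes.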

These were defensive skip entries from an earlier snapshot. Remove
all 66 entries; no test-file changes needed.
@leolara leolara force-pushed the wt-bal-7-amsterdam-oog-by-design branch from a623fa1 to 1ad2430 Compare May 14, 2026 11:54
Contributor

@kclowes kclowes left a comment

This is looking good @leolara! I may just be missing it, but what is .mb/oog-by-design-amsterdam-approaches.md? Also added a comment about adding the lift to both tx_gas values - let me know what you think!

  Bytes(""),
  ]
- tx_gas = [54000, 55000]
+ tx_gas = [54000, 55000 + fork.oog_budget_lift(creates_before_oog=1)]
Contributor

Should we apply the lift to both tx_gas values? Claude tells me that: Without the lift, tx_gas[0] on Amsterdam OoGs at CREATE2 dispatch before init code runs — the assertions still pass, but the test no longer exercises the scenario it claims to.

Member Author

I think my last commit fixes this. I see now that the first transaction was OoG'ing in the wrong place.

  # stripping the fixture-format suffix in conftest.py).
  #
- # Total entries: 554
+ # Total entries: 480
Contributor

🎉

…ter_init_code tests

Address kclowes's review on ethereum#2843: with only tx_gas[1] lifted, g=0
OoG'd at CREATE/CREATE2 dispatch on Amsterdam (NEW_ACCOUNT state-gas
spill) before init code ever ran — the assertion still held
(`NONEXISTENT` either way) but the failure mode shifted from "OoG
after init code" (the test's named scenario) to "dispatch-time OoG".

A clean closed form using `fork.oog_budget_lift(creates_before_oog=1)`
(183600) overshoots and pushes g=0 past the deploy threshold. The
Cancun 1000-gas gap between g=0 and g=1 collapses on Amsterdam:
once dispatch is cleared, the 5-byte init code is cheap enough to
always complete. Empirical binary-search on both files puts the safe
range at (166499, 167000); 166_750 sits in the middle, keeping g=0
OoG'ing at dispatch and g=1 just clearing the deploy threshold.

The two `_returndata2` variants are left unchanged — g=0's post-state
happens to land identically on both forks at the existing budget, and
adding any lift breaks them.
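
The empirical search mentioned above can be sketched as a plain binary search over a monotone predicate. This is an illustration only: `find_threshold` is a hypothetical helper, and the lambda stands in for "fill the test at this gas and check whether the deploy happened". Its 166_500 cut-off is chosen to resemble the quoted range, not measured.

```python
from typing import Callable


def find_threshold(lo: int, hi: int, succeeds: Callable[[int], bool]) -> int:
    """Return the smallest gas in (lo, hi] for which succeeds(gas) is True.

    Assumes succeeds is monotone (False below some point, True from it on)
    and that succeeds(lo) is False and succeeds(hi) is True.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid  # mid works, so the threshold is at or below mid
        else:
            lo = mid  # mid fails, so the threshold is above mid
    return hi


# Stand-in predicate; a real run would execute the test at each gas value.
deploys = lambda gas: gas >= 166_500

threshold = find_threshold(54_000, 238_600, deploys)
print(threshold)
```

Each probe halves the interval, so a range of a few hundred thousand gas needs under twenty fills per file.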
@leolara
Member Author

leolara commented May 22, 2026

> This is looking good @leolara! I may just be missing it, but what is .mb/oog-by-design-amsterdam-approaches.md? Also added a comment about adding the lift to both tx_gas values - let me know what you think!

The `.mb` directory works as a memory bank for my agent, where we store findings and things it should know in the future.

In this case the content of this file is:

# OoG-by-design ported_static tests — fix approaches for Amsterdam

After PR #2839 there are still **119 OoG-by-design entries** in
`tests/ported_static/amsterdam_skip_list.txt`. These are tests whose
budgets are tuned to land *just below* an EIP-8037-affected cost
boundary (SSTORE-set, CALL+value to empty account, MEMORY expansion,
nested CALL forwarding), so a blind gas bump invalidates their
assertion and a plain skip loses the coverage.

This note enumerates tractable approaches, ordered by risk and
expected entries cleared.

## Approach 1 — extend `tx_gas` with an Amsterdam threshold (lowest risk)

Tests of shape

```python
tx_gas = [happy_path, just_below_threshold]
```

where the threshold is "intrinsic + 1 spillable op" can be made
fork-conditional in one line:

```python
tx_gas = [800000, 80000]
if fork.is_eip_enabled(8037):
    # Amsterdam: OoG threshold lifts by the SSTORE-set state-gas
    tx_gas = [800000, 80000 + fork.sstore_state_gas()]
```

Works when the OoG depends on one spillable operation. Pick-up:
~20–30 entries — simpler stSStoreTest, stCreate2/*_oo_gafter_*,
stWalletTest. Each fix is ≤6 lines.

Approach 2 — derive the bump from the expected post-state

For tests that complete N SSTOREs and then OoG (post-state shows
storage={1:1, …, N:1} with entry N+1 missing), the Amsterdam budget
needs + N × fork.sstore_state_gas() headroom to land at the same
intermediate state. A tiny helper centralises the math:

def oog_budget(base: int, *, sstores_before_oog: int, fork: Fork) -> int:
    """Adjust an OoG-tuned gas budget for EIP-8037 state-gas spill."""
    return base + sstores_before_oog * fork.sstore_state_gas()

Per-test usage stays a single line. Pick-up: ~40–50 entries —
stSStoreTest/test_sstore_0to0*, xto_* family, parts of
stStaticCall/test_static_check_opcodes5,
stRevertTest/test_revert_opcode_multiple_sub_calls. Each fix is
≤8 lines but requires reading the expected post-state to count N.
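
The "count N from the expected post-state" step can be sketched like this. It is a hedged illustration: `completed_sstores` is a hypothetical helper, the per-SSTORE state-gas value is passed in as a plain integer instead of coming from a fork object, and 1000 is a placeholder rather than the real Amsterdam figure.

```python
def completed_sstores(expected_storage: dict[int, int]) -> int:
    """Count the slots the expected post-state shows as written (non-zero)."""
    return sum(1 for value in expected_storage.values() if value != 0)


def oog_budget(base: int, *, sstores_before_oog: int, sstore_state_gas: int) -> int:
    """Adjust an OoG-tuned budget for N spilled SSTORE-sets."""
    return base + sstores_before_oog * sstore_state_gas


# Expected post-state: three writes landed, the fourth did not.
expected = {1: 1, 2: 1, 3: 1}
n = completed_sstores(expected)

cancun_budget = oog_budget(80_000, sstores_before_oog=n, sstore_state_gas=0)
amsterdam_budget = oog_budget(80_000, sstores_before_oog=n, sstore_state_gas=1_000)
print(n, cancun_budget, amsterdam_budget)
```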

Approach 3 — fork-conditional expect_section entries

Tests already using resolve_expect_post can get a ">=Amsterdam"
entry with a smaller intermediate state — accept that the OoG fires
earlier on Amsterdam and the recorded state is different. Keeps the
original gas budget intact.

expect_entries_ = [
    {
        "indexes": {"data": -1, "gas": 1, "value": -1},
        "network": ["Cancun"],
        "result": {target: Account(storage={1: 1, 2: 1, 3: 1})},
    },
    {
        "indexes": {"data": -1, "gas": 1, "value": -1},
        "network": [">=Amsterdam"],
        "result": {target: Account(storage={1: 1})},  # OoGs sooner
    },
]

Trades gas-bump complexity for post-state enumeration. Useful where
multiple d parametrizations share a tx_gas list and you can't
bump it without breaking other entries. Pick-up: ~10–15 entries.

Approach 4 — CALL-stack-depth tests (*1024_oog family)

stCallCreateCallCodeTest/test_call1024_oog,
test_callcode1024_oog, stDelegatecallTestHomestead/test_call1024_oog,
test_delegatecall1024_oog recurse 1024 deep and assert OoG when the
call frame hits the depth limit. EIP-8037 doesn't touch stack-depth
mechanics — the failures here are gas exhaustion firing before
depth 1024 because each frame's state-gas spill compounds.

Fix: bump the inner-CALL forwarded gas so each frame survives until
the depth limit fires. Original gas budget stays. Pick-up: ~9 entries.
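
The compounding effect can be illustrated with a toy recursion model. It assumes only the EIP-150 rule that a call forwards at most all but 1/64 of the remaining gas; `max_depth`, the flat per-frame cost, and the gas figures are illustrative and are not meant to reproduce the depths measured in this PR.

```python
def max_depth(gas: int, per_frame_cost: int, depth_limit: int = 1024) -> int:
    """Count how many nested frames run before gas or the depth limit ends it."""
    depth = 0
    while depth < depth_limit and gas >= per_frame_cost:
        remaining = gas - per_frame_cost
        # EIP-150: the caller keeps 1/64 and forwards the rest.
        gas = remaining - remaining // 64
        depth += 1
    return depth


budget = 1_000_000
shallow_cost = 700    # illustrative per-frame cost without spill
spilled_cost = 7_000  # illustrative per-frame cost with state-gas spill

print(max_depth(budget, shallow_cost))
print(max_depth(budget, spilled_cost))
```

In this model every extra unit of per-frame cost is paid at each level before the 63/64 forwarding is applied, so a higher per-frame cost ends the recursion at a much shallower depth for the same starting budget.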

Approach 5 — accept-and-document the rest

The genuinely stuck cases, with explicit reasons:

  • Pre-EIP-150 gas-pricing assertions — tests like
    stStaticCall/test_static_callcallcodecallcode_011_oogm_after* are
    calibrated against gas costs that no longer match production
    behaviour. EIP-8037 spec tests already cover the new 2D gas model.
  • Memory-expansion + state-gas combination:
    stMemExpandingEIP150Calls/* hits OoG at exact byte counts where the
    state-gas reservoir changes the effective cap; no single-variable fix.
  • Inverse failures: the stEIP3860_limitmeterinitcode/*invalid cases.
    Amsterdam allows what was previously rejected; behaviour change, not
    a gas issue.

For these, add a one-line rationale next to the skip entry (or a
section header comment in amsterdam_skip_list.txt) and leave them
as documented coverage gaps.

Recommended sequencing

  1. PR-A: Approach 1 + Approach 4 — mechanical, ≤15 lines per file,
    ~30–40 entries cleared.
  2. PR-B: Approach 2 — introduce the oog_budget helper, sweep
    stSStoreTest. ~50 entries cleared. Helper lives in
    execution_testing.forks or a per-package utils module.
  3. PR-C: Approach 3 — selective use on the harder multi-d tests
    where (2) won't fit. ~10–15 entries.
  4. PR-D: documentation pass — annotate the residue (~15–20
    entries) with skip rationale comments so reviewers know they're
    intentional gaps, not "yet to be triaged".

Endpoint: ~80 stubbornly-skipped tests, each with a documented
reason — vs. the current 554 with one bucket reason for everything.

Per-test inventory (for the PR-A starting point)

Best Approach-1 candidates (single-SSTORE OoG, tx_gas[1] adjustable):

  • stSStoreTest/test_sstore_gas.py
  • stSStoreTest/test_sstore_gas_left.py
  • stCreate2/test_create2_oo_gafter_init_code.py (g1)
  • stCreate2/test_create2_oo_gafter_init_code_returndata2.py (g1)
  • stCreate2/test_create2_oo_gafter_init_code_revert2.py
  • stCreateTest/test_create_oo_gafter_init_code.py
  • stCreateTest/test_create_oo_gafter_init_code_returndata2.py
  • stCreateTest/test_create_oo_gafter_init_code_returndata_size.py
  • stCreateTest/test_create_oo_gafter_init_code_revert2.py
  • stWalletTest/test_day_limit_construction_partial.py
  • stWalletTest/test_wallet_construction_oog.py (g1)
  • stWalletTest/test_wallet_construction_partial.py
  • stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py (g1)
  • stRevertTest/test_revert_sub_call_storage_oog.py (g1)
  • stRevertTest/test_revert_sub_call_storage_oog2.py (g1)
  • stMemoryTest/test_oog.py

Best Approach-4 candidates:

  • stCallCreateCallCodeTest/test_call1024_oog.py (4 entries)
  • stCallCreateCallCodeTest/test_callcode1024_oog.py (2 entries)
  • stDelegatecallTestHomestead/test_call1024_oog.py (2 entries)
  • stDelegatecallTestHomestead/test_delegatecall1024_oog.py (1 entry)

@leolara
Member Author

leolara commented May 25, 2026

@spencer-tb I think the lint problem I am getting comes from the base branch. Which branch should I rebase this on?

@leolara
Member Author

leolara commented May 26, 2026

devnets/bal/7 ported_static — progress after PR #2843

Snapshot date: 2026-05-26
Branch: wt-bal-7-amsterdam-oog-by-design (PR #2843)

Remaining after PR #2843 merges

| Metric | Count |
| --- | --- |
| Skip-list entries (nodeid substrings) | 480 |
| Skipped fixture variants on Amsterdam fill | 1,197 |
| Passing fixture variants on Amsterdam | 17,733 |
| Amsterdam pass rate on tests/ported_static/ | 93.7% |

Cumulative reduction across all my PRs

| Stage | Skip-list entries | Fixture variants |
| --- | --- | --- |
| Before PR #2790 (start of work) | ~897 | ~2,691 |
| After PR #2796 + #2790 | ~732 | ~2,196 |
| After PR #2839 | 554 | ~1,662 |
| After PR #2843 (this PR) | 480 | 1,197 |
| Net cleared by my work | −417 | ~−1,494 |

Notes

  • Skip-list entries are nodeid substrings; each typically expands to
    ~2–3 fixture variants (state_test,
    blockchain_test_from_state_test,
    blockchain_test_engine_from_state_test).
  • The 1,197 figure is the actual count `uv run fill --fork Amsterdam`
    reports as skipped.
  • For comparison: issue #2601 (Static Test Fail Tracker for EIP-8037)
    quoted 3,423 failures on the pre-recalibration
    eips/amsterdam/eip-8037 branch. The 1,197 number on bal/7 is ~65%
    lower despite the CPSB-1530 recalibration adding new failure modes
    that didn't exist at the time #2601 was filed.
  • Classification of the remaining 480 entries (OoG-by-design,
    gas-measurement, bytecode-baked, multi-param-per-d, balance-refund,
    size-limit, other) is in
    bal-7-skipped-tests-classification.md.
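
The derived figures in this snapshot can be re-checked from the raw counts with a few lines of arithmetic. The inputs are the numbers quoted above; the variable names are just for this sketch.

```python
passed, skipped = 17_733, 1_197

# Pass rate on tests/ported_static/ for the Amsterdam fill.
pass_rate = passed / (passed + skipped)

# Net reduction since the start of the work (approximate starting values).
net_entries = 897 - 480
net_variants = 2_691 - 1_197

# Relative drop versus the 3,423 failures quoted in issue #2601.
drop_vs_2601 = 1 - skipped / 3_423

print(f"pass rate: {pass_rate:.1%}")
print(f"net cleared: {net_entries} entries, {net_variants} variants")
print(f"drop vs #2601: {drop_vs_2601:.0%}")
```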

leolara added 2 commits May 26, 2026 22:15
… → devnets/bal/7 merge

The May-18 merge `dffc4cfea` ("Merge remote-tracking branch
'upstream/forks/amsterdam' into devnets/bal/7") had two conflict
resolutions that left `bal/7` in a state where `just static` fails:

1. `src/ethereum/forks/amsterdam/blocks.py` ended up with the EIP-7843
   `slot_number: U64` field declared twice (lines 260 and 268).
   `mypy` rejects it with `[no-redef]`; `ethereum-spec-lint` crashes
   with `ValueError: duplicate path Header.slot_number`. Remove the
   second copy.

2. `BuiltBlock.derive_engine_payload_modifier` was dropped from
   `packages/testing/src/execution_testing/specs/blockchain.py` while
   its 5 call sites in `specs/tests/test_types.py` were kept. `mypy`
   reports 5 `attr-defined` errors. Restore the staticmethod (and the
   `FixtureExecutionPayloadModifier` import it needs) from
   `forks/amsterdam`.

The instance-level wiring on `forks/amsterdam` (`BuiltBlock.rlp_modifier`
field, constructor pass-through, and `get_fixture_engine_new_payload`
call) is **not** restored — neither the linter nor the test file
references it, and `bal/7`'s current `get_fixture_engine_new_payload`
already runs without it.

This unblocks CI on every open PR against `devnets/bal/7` (including
this one).
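
A duplicate field like the one in item 1 is easy to miss because Python accepts a repeated annotation in a class body at runtime; only the static tools complain. A minimal sketch of a check using the standard `ast` module is shown below. The `Header` source is a cut-down, hypothetical example and `duplicate_fields` is not part of any tool mentioned here.

```python
import ast
from collections import Counter

# Cut-down, hypothetical class body with the same kind of merge leftover.
SOURCE = '''
class Header:
    number: Uint
    slot_number: U64
    gas_used: Uint
    slot_number: U64
'''


def duplicate_fields(source: str) -> dict[str, list[str]]:
    """Map each class name to the annotated fields it declares more than once."""
    dupes: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        names = [
            stmt.target.id
            for stmt in node.body
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
        ]
        repeated = [name for name, count in Counter(names).items() if count > 1]
        if repeated:
            dupes[node.name] = repeated
    return dupes


print(duplicate_fields(SOURCE))
```

The source is only parsed, never executed, so the annotation names do not need to be importable.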
@leolara
Member Author

leolara commented May 26, 2026

@spencer-tb This commit, 615745c, fixes something unrelated to this PR: an error in the bal/7 branch that is causing the automated tests to fail.

@leolara
Member Author

leolara commented May 28, 2026

@kclowes could you please check again?

